Adam Guenoun, Indira Martinez, Nicholas Solis
In this analysis, we investigate factors correlated with diabetes. Using a data set of 100,000 people, we examine the relationships between diabetes and age, HbA1c level, smoking history, and glucose level. With this many data points, we ask whether the trends in the data match our general understanding of diabetes. Our goal is to assess which of the 9 variables play a stronger role in the development of diabetes and whether the trends support our assumptions about this data. Through data visualization, chart analysis, and numerical analysis, we present the data to show a general audience the important factors that contribute to diabetic trends.
library(tidyverse) ## Loaded for dplyr
library(ggplot2) ## Loaded for plotting
library(plotly) ## Loaded for interactive plots
library(readr) ## Loaded to read in data
library(knitr) ## Loaded to compute and display data
library(scales) ## Loaded to scale data

| gender | age | hypertension | heart_disease | smoking_history | bmi | HbA1c_level | blood_glucose_level | diabetes |
|---|---|---|---|---|---|---|---|---|
| Female | 80 | 0 | 1 | never | 25.19 | 6.6 | 140 | 0 |
| Female | 54 | 0 | 0 | No Info | 27.32 | 6.6 | 80 | 0 |
| Male | 28 | 0 | 0 | never | 27.32 | 5.7 | 158 | 0 |
| Female | 36 | 0 | 0 | current | 23.45 | 5.0 | 155 | 0 |
| Male | 76 | 1 | 1 | current | 20.14 | 4.8 | 155 | 0 |
| Female | 20 | 0 | 0 | never | 27.32 | 6.6 | 85 | 0 |
For our first plot, we categorized males and females as diabetic, prediabetic, or normal based on their HbA1c level, a measure of average blood sugar.
| gender | diabetes | HbA1c_level | HbA1c_category |
|---|---|---|---|
| Female | 0 | 6.6 | Diabetic ≥ 6.5% |
| Female | 0 | 6.6 | Diabetic ≥ 6.5% |
| Male | 0 | 5.7 | Prediabetic 5.7% - 6.4% |
| Female | 0 | 5.0 | Normal < 5.7% |
| Male | 0 | 4.8 | Normal < 5.7% |
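The categorization shown in the table above can be sketched as follows. This is a minimal sketch, not necessarily the code used for the report: the cut-offs and labels are read off the `HbA1c_category` column, and the object names `patients` and `hba1c_groups` are assumptions introduced for illustration.

```r
library(dplyr)

# Five rows copied from the preview table above
patients <- tibble(
  gender      = c("Female", "Female", "Male", "Female", "Male"),
  diabetes    = c(0, 0, 0, 0, 0),
  HbA1c_level = c(6.6, 6.6, 5.7, 5.0, 4.8)
)

# Assign each person to an HbA1c band.
# Every band has an explicit condition, so a missing HbA1c value stays NA.
hba1c_groups <- patients %>%
  mutate(
    HbA1c_category = case_when(
      HbA1c_level >= 6.5 ~ "Diabetic ≥ 6.5%",
      HbA1c_level >= 5.7 ~ "Prediabetic 5.7% - 6.4%",
      HbA1c_level <  5.7 ~ "Normal < 5.7%"
    )
  )

hba1c_groups
```

`case_when()` evaluates its conditions in order, so the prediabetic band only needs a lower bound: anything at or above 6.5 has already been labeled diabetic.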
Our density plot supports the correlation between aging and disease prevalence by comparing two groups: individuals with all three conditions (diabetes, heart disease, and hypertension) and individuals who are free of all three.
Older individuals (50+) are much more likely to have diabetes, heart disease, and hypertension.
Younger individuals (under 50) are more likely to be free of these conditions.
| age | diabetes | heart_disease | hypertension | group |
|---|---|---|---|---|
| 57 | 1 | 1 | 1 | Diabetes, H.D, and Hyp. |
| 62 | 1 | 1 | 1 | Diabetes, H.D, and Hyp. |
| 62 | 1 | 1 | 1 | Diabetes, H.D, and Hyp. |
| 67 | 1 | 1 | 1 | Diabetes, H.D, and Hyp. |
| 72 | 1 | 1 | 1 | Diabetes, H.D, and Hyp. |
| age | heart_disease | diabetes | hypertension | group |
|---|---|---|---|---|
| 54 | 0 | 0 | 0 | Free of Diabetes, H.D, and Hyp. |
| 28 | 0 | 0 | 0 | Free of Diabetes, H.D, and Hyp. |
| 36 | 0 | 0 | 0 | Free of Diabetes, H.D, and Hyp. |
| 20 | 0 | 0 | 0 | Free of Diabetes, H.D, and Hyp. |
| 79 | 0 | 0 | 0 | Free of Diabetes, H.D, and Hyp. |
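One way to build the two groups previewed above and draw their age densities is sketched below. The rows are copied from the preview tables in this report (the last two rows, ages 80 and 76, come from the first preview and belong to neither group). The names `people`, `grouped`, `n_conditions`, and `age_density`, as well as the plot styling, are assumptions rather than the report's actual code.

```r
library(dplyr)
library(ggplot2)

# Rows copied from the preview tables in this report
people <- tibble(
  age           = c(57, 62, 62, 67, 72, 54, 28, 36, 20, 79, 80, 76),
  diabetes      = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0),
  heart_disease = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1),
  hypertension  = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1)
)

grouped <- people %>%
  mutate(
    # Number of the three conditions each person has (0 to 3)
    n_conditions = diabetes + heart_disease + hypertension,
    group = case_when(
      n_conditions == 3 ~ "Diabetes, H.D, and Hyp.",
      n_conditions == 0 ~ "Free of Diabetes, H.D, and Hyp."
    )
  ) %>%
  # People with only one or two conditions get NA and are dropped
  filter(!is.na(group))

# Overlaid age densities, one per group
age_density <- ggplot(grouped, aes(x = age, fill = group)) +
  geom_density(alpha = 0.5) +
  labs(x = "Age", y = "Density", fill = "Group")
```

Printing `age_density` renders the plot. With only ten rows the curves are rough; on the full data set the same code would give smoother curves.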
The graph below is split by whether or not a person has hypertension. Comparing BMI across the two groups, the majority of people both with and without hypertension lie within a BMI range of 25-29. Notice that for people with hypertension, the density above the red line is greater than for people without hypertension, indicating that a larger share of people with hypertension have a higher BMI.
| gender | age | hypertension | heart_disease | smoking_history | bmi | HbA1c_level | blood_glucose_level | diabetes |
|---|---|---|---|---|---|---|---|---|
| Female | 80 | 0 | 1 | never | 25.19 | 6.6 | 140 | 0 |
| Female | 54 | 0 | 0 | No Info | 27.32 | 6.6 | 80 | 0 |
| Male | 28 | 0 | 0 | never | 27.32 | 5.7 | 158 | 0 |
| Female | 36 | 0 | 0 | current | 23.45 | 5.0 | 155 | 0 |
| Male | 76 | 1 | 1 | current | 20.14 | 4.8 | 155 | 0 |
Here we compare blood glucose levels by diabetes status for individuals aged 3-80. We selected this age range because it spans the youngest and oldest people with diabetes in the data.
For individuals without diabetes, blood sugar typically ranges from 120–140 mg/dL 2-3 hours after eating.
For individuals with diabetes, blood sugar can be 200 mg/dL or higher 2-3 hours after eating.
| age | diabetes | blood_glucose_level |
|---|---|---|
| 80 | No Diabetes | 140 |
| 54 | No Diabetes | 80 |
| 28 | No Diabetes | 158 |
| 36 | No Diabetes | 155 |
| 76 | No Diabetes | 155 |
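The age filter and the relabeling of the `diabetes` column seen in the table above might look like the sketch below. The first five rows are copied from the preview. The sixth row (age 2) is hypothetical and only there to show the filter removing something. The label `"Diabetes"` for the positive case and the name `glucose_by_status` are also assumptions.

```r
library(dplyr)

patients <- tibble(
  age                 = c(80, 54, 28, 36, 76, 2),   # last row is hypothetical
  diabetes            = c(0, 0, 0, 0, 0, 0),
  blood_glucose_level = c(140, 80, 158, 155, 155, 100)
)

glucose_by_status <- patients %>%
  # Keep ages 3 through 80, inclusive on both ends
  filter(between(age, 3, 80)) %>%
  # Replace the 0/1 flag with a readable label for plotting
  mutate(diabetes = if_else(diabetes == 1, "Diabetes", "No Diabetes")) %>%
  select(age, diabetes, blood_glucose_level)

glucose_by_status
```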
Here we compare the age and BMI of individuals with heart disease only and individuals with diabetes only. A BMI of 27.32 is common to both conditions.
An overweight BMI ranges from 25.0 to 29.9, so we conclude that many individuals with either condition are categorized as overweight.
| age | bmi | diabetes | condition |
|---|---|---|---|
| 44 | 19.31 | 1 | Diabetes Only |
| 67 | 27.32 | 1 | Diabetes Only |
| 50 | 27.32 | 1 | Diabetes Only |
| 73 | 25.91 | 1 | Diabetes Only |
| 53 | 27.32 | 1 | Diabetes Only |
| age | bmi | heart_disease | condition |
|---|---|---|---|
| 80 | 25.19 | 1 | Heart Disease Only |
| 76 | 20.14 | 1 | Heart Disease Only |
| 72 | 27.94 | 1 | Heart Disease Only |
| 67 | 27.32 | 1 | Heart Disease Only |
| 77 | 32.02 | 1 | Heart Disease Only |
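A possible way to derive the two "only" groups is sketched below. It assumes "only" means the other of the two conditions is absent and that hypertension is ignored. That reading fits the 76-year-old in the first preview table, who has hypertension yet appears under Heart Disease Only. The names `patients` and `single_condition` are illustrative, and the `heart_disease = 0` values for the diabetic rows are inferred from their label.

```r
library(dplyr)

# Rows taken from the preview tables; the last row has neither condition
patients <- tibble(
  age           = c(44, 67, 80, 76, 28),
  bmi           = c(19.31, 27.32, 25.19, 20.14, 27.32),
  diabetes      = c(1, 1, 0, 0, 0),
  heart_disease = c(0, 0, 1, 1, 0)
)

single_condition <- patients %>%
  mutate(
    condition = case_when(
      diabetes == 1 & heart_disease == 0 ~ "Diabetes Only",
      heart_disease == 1 & diabetes == 0 ~ "Heart Disease Only"
    )
  ) %>%
  # Rows with both or neither condition get NA and are dropped
  filter(!is.na(condition))

# Tally BMI values within each condition, most frequent first
single_condition %>% count(condition, bmi, sort = TRUE)
```

On the full data set, the final `count()` call is one way to check which BMI value is most common in each group.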
Each person in this plot has heart disease. Here we compare people classified as underweight and overweight, grouped by sex, based on a BMI scale. A noticeably larger percentage of this population falls into the overweight category. Visually, the data suggest that the chance of heart disease increases as weight increases.
The data here depends heavily on the BMI scale. It is important to note that BMI alone is not a reliable indicator of diabetes, but there is a general trend in the data: people with a BMI over 30 are more likely to be diabetic.
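The exact BMI bands behind the two comparisons above are not shown, so the following is only a sketch of how a BMI scale could be applied. The overweight range (25.0 to 29.9) and the threshold of 30 come from the text; the underweight cut-off of 18.5 is the standard WHO value and is an assumption here, as are the band labels. The BMI values are copied from the preview tables.

```r
# BMI values copied from the preview tables in this report
bmi <- c(25.19, 27.32, 23.45, 20.14, 32.02, 19.31)

# right = FALSE makes each band include its lower bound: [25, 30) is Overweight
bmi_category <- cut(
  bmi,
  breaks = c(-Inf, 18.5, 25, 30, Inf),
  labels = c("Underweight", "Normal", "Overweight", "Obese"),
  right  = FALSE
)

data.frame(bmi, bmi_category)
```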
This graph depicts the HbA1c categories and their relation to patients’ hypertension status.
This graph shows the population density of men by diabetes status across age ranges.
This graph shows the population density of women by diabetes status across age ranges.
The smoking data has 6 unique values.
The total number of people in each category is as follows:
Never: 35095
Not current: 6447
Former: 9352
Current: 9286
Ever: 4004
A sizable number of people fall into the ‘No Info’ category, the sixth value.
The dataset contains 100,000 people in total. To clean up the data, we filter out the ‘No Info’ people, which leaves 64,184 (so 35,816 people had no smoking information).
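The counting and filtering steps above can be reproduced with a short sketch. Since the raw file is not included here, the `smoking_history` column is rebuilt from the counts listed above. The lowercase spellings `"not current"`, `"former"`, and `"ever"` are assumptions modeled on the `"never"`, `"current"`, and `"No Info"` values visible in the preview table, and the names `smoking` and `known_smoking` are illustrative.

```r
library(dplyr)

# Rebuild the column from the reported counts; "No Info" is the remainder
smoking <- tibble(
  smoking_history = rep(
    c("never", "not current", "former", "current", "ever", "No Info"),
    times = c(35095, 6447, 9352, 9286, 4004, 100000 - 64184)
  )
)

# Number of people per category, largest first
smoking %>% count(smoking_history, sort = TRUE)

# Drop the people with no smoking information
known_smoking <- smoking %>% filter(smoking_history != "No Info")
nrow(known_smoking)
```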
The data was then grouped by smoking category and diabetes status, and the counts in each group were totaled.
The percentage of each smoking category with diabetes is then calculated by dividing the count with diabetes by the total count in that smoking category.
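The percentage calculation described above can be sketched as follows. The rows in `toy` are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the dataset and the resulting percentages say nothing about the real data. The column names `total`, `with_diabetes`, and `pct_with_diabetes` are likewise assumptions.

```r
library(dplyr)

# Hypothetical toy rows, NOT taken from the dataset
toy <- tibble(
  smoking_history = c("never", "never", "never", "never",
                      "former", "former", "current", "current"),
  diabetes        = c(0, 0, 0, 1, 1, 0, 0, 0)
)

diabetes_by_smoking <- toy %>%
  group_by(smoking_history) %>%
  summarise(
    total         = n(),                 # people in the smoking category
    with_diabetes = sum(diabetes == 1),  # of those, how many have diabetes
    .groups       = "drop"
  ) %>%
  # Share of each smoking category that has diabetes, in percent
  mutate(pct_with_diabetes = 100 * with_diabetes / total)

diabetes_by_smoking
```

Dividing by the category total rather than the overall total keeps categories of very different sizes comparable.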
Now we can graph the relationship between smoking and diabetes as separated by smoking category.
Looking at the data, there is a spike in former smokers’ risk for the three health issues discussed.
Questions arose: If you never quit smoking, do you maintain a risk similar to people who have never smoked? What changes between current and former smokers?
To be classified as a former smoker, you must at one point have been a current smoker, which implies that people are older when they move from current to former smoking status.
This led us to graph the density of each smoking category by age.
The data shows an increase in former smokers and a simultaneous decrease in current smokers around the 40-60 year age range.
We can compare this with the density of people with diabetes, hypertension, and heart disease across all ages to see if there are similar spikes.
Former smokers are at a higher risk for diabetes, hypertension, and heart disease.
As people get older, their risk for disease increases.
Although current smokers’ risk is not reflected in our percentage graph, further digging shows that the age range where current smokers decrease and former smokers increase is about the same age range where the risk for disease increases.
Within this analysis we’ve examined:
- Diabetes status by age and sex
- The relation between BMI and hypertension status
- Shared populations between diabetes, heart disease, and hypertension status
- HbA1c trends among the sexes
- Glucose levels by diabetes status
- Smoking status and its relation to diabetes, heart disease, and hypertension

From this, we were able to effectively show and support our general assumptions about diabetes and its cofactors.
Limitations:
This data does not indicate whether the diabetic patients have type 1 or type 2 diabetes.
This data is compiled from various studies, which together make up the 100,000 patients.
The data does not detail how blood glucose levels were measured.